Skip to content

feat(slirp): slirp_crr tracing target for CRR-path investigation#82

Closed
dpsoft wants to merge 1 commit intomainfrom
slirp-crr-instrument
Closed

feat(slirp): slirp_crr tracing target for CRR-path investigation#82
dpsoft wants to merge 1 commit intomainfrom
slirp-crr-instrument

Conversation

@dpsoft
Copy link
Copy Markdown
Contributor

@dpsoft dpsoft commented May 6, 2026

Summary

Four tracing::trace! call sites under target slirp_crr that emit a timeline of one TCP CRR cycle through voidbox. Filtered out by default; cost is one atomic-load per call site when RUST_LOG=slirp_crr=trace is unset.

Used to root-cause the voidbox-vs-pasta CRR p50 gap (122× from PR #81's harness): the gap is dominated by guest-side nc fork+exec per CRR iteration, not voidbox's NAT path. Trace evidence in the PR description below.

What it traces

Call site What it tells you
process_guest_frame exit (when ≥1 frame queued) guest→host packet processed; how many frames it queued; whether epoll_waker.wake() fired
SYN-ACK queued in handle_tcp_syn_outbound (immediate-connect path) timestamp of SYN-ACK emit so the wait between guest SYN and SYN-ACK delivery is visible
drain_to_guest exit (when ≥1 frame drained) net-poll thread picked up the inject queue
net_poll_thread: epoll_wait return + try_inject_rx + IRQ raise the actual wait time per cycle, with raw vs visible event counts

Activation

RUST_LOG=warn,slirp_crr=trace target/release/voidbox-network-bench --iterations 1 \
    --output /tmp/voidbox-bench.json 2> /tmp/crr-trace.log

Then grep for slirp_crr and post-process for inter-event gaps.

Findings from the first run (CRR p50 ≈ 10 ms)

Hypothesis Verdict
5 ms ACTIVE_TIMEOUT cadence in net_poll_thread is the floor Falsified. epoll_wait returns within 0.05–3.7 ms; cadence isn't bitten.
Wake-on-push missing on hot paths Falsified. process_guest_frame already wakes when frames are queued.
HashMap/cache pressure on connect path Unlikely. PMU shows tiny instruction volume during CRR; 30 misses/1K is high but on a small denominator.
Linux delayed-ACK timer (40 ms) Falsified. No 40 ms gaps; rhythms are 3.5–8.6 ms.
Guest-side fork+exec of nc per iteration Confirmed. 8.6 ms inter-event rhythm matches per-iteration fork cost. RR latency without fork is 2 µs.

The trace shows the net-poll thread waiting 3.5–3.7 ms between events, in a steady rhythm of one cycle every ~8.6 ms during the CRR phase. That gap is not voidbox sleeping past a deadline — it's voidbox waiting for the next packet from the guest. Each iteration spawns a fresh nc; busybox-nc fork+exec inside the guest dominates per-iteration cost.

Cost when not activated

tracing::trace! with target slirp_crr is an atomic-load when no subscriber filters it in. Verified: the bench numbers from PR #81's harness were taken with RUST_LOG=info; identical to the pre-instrumentation run within bench noise.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --test network_baseline -- --test-threads=1 — 18/18
  • RUST_LOG=warn,slirp_crr=trace voidbox-network-bench --iterations 1 produces parseable trace lines
  • CI

Follow-up (not in this PR)

A guest-side static binary that runs N TCP CRRs in a single process — eliminates the fork+exec contribution and isolates voidbox's NAT-path CRR cost so we can size the real gap to pasta. Tracked separately.

Adds 4 tracing::trace! call sites under target slirp_crr that emit a
timeline of one TCP CRR cycle through voidbox:

  - process_guest_frame: when a guest frame queues ≥1 outbound frame
    + epoll_waker.wake() fires.
  - SYN-ACK queued in handle_tcp_syn_outbound (immediate-connect path).
  - drain_to_guest: when ≥1 frame is drained to the net-poll thread.
  - net_poll_thread: epoll_wait return (with raw_events / wait_us /
    timeout_ms) and try_inject_rx + IRQ raise.

Filtered out by default; cost is one atomic-load per call site when
RUST_LOG=slirp_crr=trace is unset.  Activate with:

    RUST_LOG=warn,slirp_crr=trace target/release/voidbox-network-bench ...

Used to root-cause the voidbox-vs-pasta CRR p50 gap: 122x is dominated
by guest-side fork+exec of busybox-nc per CRR iteration (8.6ms inter-
event rhythm in the trace), not the voidbox NAT path itself (per-event
work is microsecond-scale).  Keeping the instrumentation in tree so
the next investigation doesn't have to rebuild it.
@dpsoft dpsoft closed this May 6, 2026
@dpsoft dpsoft deleted the slirp-crr-instrument branch May 6, 2026 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant